Figure from Allison Horst
We will build upon our last lesson, ggplot101, which focused on an overall understanding of the grammar of graphics, basic syntax, adding data, aesthetic mappings, and geoms. Today we will focus on some of the other commonly adjusted layers: facets, scales, labels, and themes.
Before we get started, let’s load our libraries and data.
library(tidyverse)
library(gardenR)
And let’s remember what’s in garden_harvest.
glimpse(garden_harvest)
## Rows: 781
## Columns: 5
## $ vegetable <chr> "lettuce", "radish", "lettuce", "lettuce", "radish", "lettuc…
## $ variety <chr> "reseed", "Garden Party Mix", "reseed", "reseed", "Garden Pa…
## $ date <date> 2020-06-06, 2020-06-06, 2020-06-08, 2020-06-09, 2020-06-11,…
## $ weight <dbl> 20, 36, 15, 10, 67, 12, 9, 8, 53, 19, 14, 10, 48, 58, 8, 121…
## $ units <chr> "grams", "grams", "grams", "grams", "grams", "grams", "grams…
Faceting allows you to create small multiples of plots, enabling easy comparison across the entirety of your data. A benefit of plots like this is that they are all structured the same way, so once you understand one, you can begin to look at trends across groups/treatments/conditions simply and easily.
Here is a more infographic example of using small multiples.
Figure from Five Thirty Eight
So we can easily see that states with more of a maroon color have a lower than average life expectancy, while those that are higher than average are orange. We also can see easily where each state is on the map, so we can begin to understand how geography is related to life expectancy. We can also see which states have gotten better (i.e. their people live longer) with time, and those that haven’t. And this is all with a quick glance!
If we look back to the plot we were using as our example last week, we can see that it is faceted by tomato variety.
First, let’s select only the data for tomatoes.
# filter data to include only tomatoes
# filter() is a useful function from dplyr (part of tidyverse)
# it allows us to select observations based on their values
garden_harvest_tomato <- garden_harvest %>%
  filter(vegetable == "tomatoes")
Let’s remember what our base plot is currently looking like.
garden_harvest_tomato %>%
  ggplot(aes(x = date, y = weight, color = variety)) +
  geom_line() +
  geom_point(size = 1)
See how crowded this is? I think faceting might help us better see our data by variety.
There are two functions that allow you to facet:
facet_wrap: lays out your facets in a wrapped sequence of panels. You can use facet_wrap if you have 1 variable you’d like to facet on.
facet_grid: lays out your facets in a grid. You can use facet_grid if you have 1 or 2 variables you’d like to facet on.
There are a few different sets of syntax that work for faceting, but I think this is the most intuitive.
garden_harvest_tomato %>%
  ggplot(aes(x = date, y = weight)) +
  geom_line() +
  geom_point(size = 1) +
  facet_wrap(vars(variety))
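As a small sketch of the "different sets of syntax" mentioned above, the example below facets the same plot with vars() and with a one-sided formula. It uses the mpg dataset that ships with ggplot2 rather than garden_harvest so that it runs on its own; the object names (base_plot, p_vars, p_formula, p_two_rows) are just illustrative.

```r
library(ggplot2)

# Base plot on the built-in mpg data: city vs. highway gas mileage
base_plot <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# vars() syntax, as used in this lesson
p_vars <- base_plot + facet_wrap(vars(class))

# Formula syntax: a one-sided formula naming the faceting variable
p_formula <- base_plot + facet_wrap(~ class)

# nrow (or ncol) controls how the wrapped panels are arranged
p_two_rows <- base_plot + facet_wrap(vars(class), nrow = 2)

# Draw one of them; p_formula should show the same panels
p_vars
```

Both spellings should give the same set of panels, so which one you use is mostly a matter of taste.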
We will get a noticeably different-looking plot from facet_grid with the default settings.
garden_harvest_tomato %>%
  ggplot(aes(x = date, y = weight)) +
  geom_line() +
  geom_point(size = 1) +
  facet_grid(vars(variety))
Note that because you have provided only one variable, and the first argument of facet_grid is rows, ggplot has stacked the facets as rows in a single column.
garden_harvest_tomato %>%
  ggplot(aes(x = date, y = weight)) +
  geom_line() +
  geom_point(size = 1) +
  facet_grid(cols = vars(variety))
We can make the facets go by column instead using cols, but this is also hard to read because each panel becomes very narrow.
However, you might be thinking now that if you have two variables and want to facet by the combination of them, you could do that with facet_grid. Here is an example with the mpg dataset from the tidyverse (since there isn’t really good data to demonstrate this in garden_harvest).
mpg %>%
  ggplot(aes(x = cty, y = hwy)) + # city and highway gas mileage
  geom_point() +
  facet_grid(cols = vars(class), # category of car
             rows = vars(drv)) # type of drive train, 4 wheel, front, rear
The default in both facet_wrap and facet_grid is for the x- and y-axes to be fixed and constant among all the plots. This is often what you want in order to take advantage of the comparisons between small multiples, but it is something you can change if you want.
garden_harvest_tomato %>%
  ggplot(aes(x = date, y = weight)) +
  geom_line() +
  geom_point(size = 1) +
  facet_wrap(vars(variety), scales = "free")
Do note how this affects how easy it is to compare among the facets now.
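If freeing both axes costs too much comparability, you can free just one. Here is a minimal sketch of that idea, again using the mpg dataset bundled with ggplot2 so it is self-contained; the object names are illustrative.

```r
library(ggplot2)

# Base plot on the built-in mpg data
base_plot <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()

# Only the y-axis varies between panels; the x-axis stays shared
p_free_y <- base_plot + facet_wrap(vars(class), scales = "free_y")

# Only the x-axis varies between panels; the y-axis stays shared
p_free_x <- base_plot + facet_wrap(vars(class), scales = "free_x")

p_free_y
```

Keeping one axis shared preserves at least one direction in which the panels can still be compared directly.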
Using scales allows you to control how the data are linked to the visual properties of your plot. Some books will include labels as a part of scales but I’m going to cover them separately.
Scales allow you to pick colors, shapes, alphas, lines, transformations (e.g. scaling your axes to a log scale), and others. You can also use scales to set the limits of your plots.
Scales functions start with scale_.
Here are some common things you might do with the scale_
functions.
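To make this concrete, here is a small sketch that uses three scale_ functions at once: one for a transformation, one for limits, and one for colors. It uses the mpg dataset that ships with ggplot2, and the specific limits and hex colors are arbitrary choices for illustration.

```r
library(ggplot2)

p_scaled <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # Transformation: put engine displacement on a log10 axis
  scale_x_log10() +
  # Limits: fix the y-axis range (all hwy values fall inside it)
  scale_y_continuous(limits = c(0, 50)) +
  # Colors: assign a specific color to each drive train
  scale_color_manual(
    values = c("4" = "#1b9e77", "f" = "#d95f02", "r" = "#7570b3")
  )

p_scaled
```

The pattern in the function names is scale_, then the aesthetic (x, y, color, ...), then the kind of scale (log10, continuous, manual, ...).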
Having good labels helps your reader (and you, when you come back to the plot in the future) understand what it’s all about.
In the labs() function, you can indicate:
x for the x-axis label
y for the y-axis label
title for a title
subtitle for a subtitle underneath your title
caption for a caption
In theme() you can change characteristics of these labels like their size, font, justification, etc.
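Here is a short sketch of labs() with each of these arguments filled in, plus one theme() tweak to the title. It uses the mpg dataset from ggplot2 so it runs on its own, and the label text is made up for illustration.

```r
library(ggplot2)

p_labeled <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  labs(
    x = "City mileage (miles per gallon)",
    y = "Highway mileage (miles per gallon)",
    title = "City vs. highway gas mileage",
    subtitle = "Each point is one car in the mpg dataset",
    caption = "Data: mpg dataset from ggplot2"
  ) +
  # Make the title larger and center it
  theme(plot.title = element_text(size = 16, hjust = 0.5))

p_labeled
```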
Themes control all the non-data parts of your plot. There are some pre-set “complete” themes, which you can recognize because they are called theme_XXX(), and you can adjust any individual theme parameter by setting it within theme(). There are dozens of parameters you can set within theme(), and they include text size, axis label orientation, the presence of a legend, and many others.
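A minimal sketch of combining a complete theme with individual adjustments is below, using the mpg dataset from ggplot2. The particular settings are arbitrary examples. Note that the complete theme goes first: adding theme_minimal() after theme() would replace the earlier adjustments.

```r
library(ggplot2)

p_themed <- ggplot(mpg, aes(x = class, y = hwy, color = drv)) +
  geom_boxplot() +
  # Start from a pre-set complete theme
  theme_minimal() +
  # Then adjust individual theme parameters
  theme(
    legend.position = "bottom",                        # where the legend sits
    axis.text.x = element_text(angle = 45, hjust = 1), # rotate x-axis labels
    text = element_text(size = 14)                     # base text size
  )

p_themed
```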
In class, we will practice using ggplot and adjusting facets, scales, labels, and themes.
You can find the in-class content here.